Last updated: 2023-12-06

Checks: 6 1

Knit directory: ILD_ASE_Xenium/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


The R Markdown is untracked by Git. To know which version of the R Markdown file created these results, you’ll want to first commit it to the Git repo. If you’re still working on the analysis, you can ignore this warning. When you’re finished, you can run wflow_publish to commit the R Markdown file and build the HTML.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(20231206) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 210644b. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Untracked files:
    Untracked:  analysis/Xenium_preprocessing.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


There are no past versions. Publish this analysis with wflow_publish() to start tracking its development.


Introduction

Processing Xenium data for clustering.

suppressPackageStartupMessages({library(cli)
                                library(Seurat)
                                library(SeuratObject)
                                library(SeuratDisk)
                                library(tidyverse)
                                library(tibble)
                                library(ggplot2)
                                library(ggpubr)
                                library(ggrepel)
                                library(workflowr)})
Loading Seurat v5 beta version 
To maintain compatibility with previous workflows, new Seurat objects will use the previous object structure by default
To use new Seurat v5 assays: Please run: options(Seurat.object.assay.version = 'v5')

Environment variables

set.seed(9999)
options(scipen = 99999)
options(ggrepel.max.overlaps = Inf)

# A function for better VlnPlot
BetterVlnPlot <- function(data, features, ylim = NA){
  VlnPlot(data, pt.size = 0, features = features, 
          group.by = "sample", y.max = ylim) + labs(x = "") + NoLegend()
}

Cell type markers

epithelial_features <- c("EGFR", "DUOX1", "NKX2-1", "AGER", "RTKN2", "NAPSA", "PGC", "SFTA2",
                         "SFTPC", "SFTPD", "KRT14", "KRT15", "KRT5", "KRT6A", "S100A2",
                         "TP63", "KRT17", "AGR3", "C20orf85", "FOXJ1", "GCLM", "DMBT1",
                         "EPCAM", "KRT18", "MGST1", "MMP7", "FOXI1", "MUC5B", "SCGB1A1", 
                         "SCGB3A2", "WFDC2", "ATF3", "KRT8", "SOX9", "SPINK1", "GKN2",
                         "MMP10", "SOX2", "CDH26", "TP73", "CFTR", "HES1", "PKM", "SOX4",
                         "NUCB2", "RNASE1", "SAA2", "AKR1C1", "AKR1C2", "BPIFA1", "CEACAM5",
                         "ERN2", "FCGBP", "GSR", "LTF", "CCNA1", "ICAM1", "ITGB1")
endothelial_features <- c("APLN", "CA4", "HEY1", "BMPR2", "CD34", "EPAS1", "FCN3", 
                          "GNG11", "PECAM1", "APLNR", "COL15A1", "PLVAP", "ACKR1",
                          "POSTN", "CLDN5", "RAMP2", "ZEB1", "HAS1", "KDR", "CDKN2A")
immune_features <- c("PPARG", "BANK1", "CD19", "CD79A", "LTB", "MS4A1", "TNFRSF13C", 
                     "CD86", "GZMB", "HLA-DRA", "CCR7", "CXCR4", "PTPRC", "TCL1A", 
                     "CD69", "CD4", "CD8A", "CD8B", "CD2", "CD28", "CD3D", "CD3E", 
                     "CD3G", "FOXP3", "GZMK", "TRAC", "ITM2C", "CD27", "CCL5", "LCK", 
                     "FABP4", "MARCO", "MCEMP1", "SPP1", "FCN1", "S100A12", "S100A8", 
                     "S100A9", "CCL22", "ITGAM", "NFKB1",  "IFIT2", "FGFBP2", "GNLY", 
                     "KLRB1", "KLRC1", "NKG7", "LILRA4", "BCL2", "CD79B", "CXCR5", 
                     "CXCL9", "GPR183", "HLA-DQA1", "KLRG1", "BCL2L11", "CD52", 
                     "SLC1A3", "TNFRSF9", "CTLA4",  "IL2RA", "LAG3", "PDCD1", "PDIA6", 
                     "PIM2", "IL7R", "LEF1", "FASLG", "HAVCR2", "ISG20", "CPA3", 
                     "KIT", "TPSAB1", "C1QC", "CD68", "MS4A7", "AIF1", "CD14", 
                     "FCGR3A", "FCER1G", "SLC25A37", "CD247", "GZMA", "IRF7")
mesenchymal_features <- c("MFAP5", "PI16", "SFRP2", "ELN", "FAP", "AXL", "LGR6", 
                          "COL1A1", "COL1A2", "COL3A1", "DCN", "FN1", "HAS2", 
                          "LUM", "MEG3", "SPARCL1", "CTHRC1")

Import data

data_dir <- "/tgen_labs/banovich/xenium_run_folders/ILDASE/20231102__213217__NB_ILDASE_A_B/"
id_list <- c(
  THD0026 = "output-XETG00048__0005070__THD0026__20231102__222410",
  TILD001 = "output-XETG00048__0005070__TILD001__20231102__222410",
  TILD062 = "output-XETG00048__0005070__TILD062__20231102__222410",
  TILD093 = "output-XETG00048__0005070__TILD093__20231102__222410",
  TILD103 = "output-XETG00048__0005070__TILD103__20231102__222410",
  TILD123 = "output-XETG00048__0005070__TILD123__20231102__222410",
  VUILD10 = "output-XETG00048__0005070__VUILD104__20231102__222410",
  VUILD48 = "output-XETG00048__0005070__VUILD48__20231102__222410",
  VUILD96 = "output-XETG00048__0005070__VUILD96__20231102__222410",
  TILD006 = "output-XETG00048__0005122__TILD006__20231102__222410",
  TILD030 = "output-XETG00048__0005122__TILD030__20231102__222410",
  TILD037 = "output-XETG00048__0005122__TILD037__20231102__222410",
  VUILD59 = "output-XETG00048__0005122__VUILD59__20231102__222410",
  VUILD91 = "output-XETG00048__0005122__VUILD91__20231102__222410",
  TILD010_VUHD115_TILD126 = "output-XETG00048__0005070__MERGED_TILD010_VUHD115_TILD126__20231102__222410",
  TILD028_TILD074 = "output-XETG00048__0005070__MERGED_TILD028_TILD074__20231102__222410",
  TILD041_TILD055 = "output-XETG00048__0005070__MERGED_TILD041_TILD055__20231102__222410",
  TILD049_TILD111_TILD080_VUHD113 = "output-XETG00048__0005122__MERGED_TILD049_TILD111_TILD080_VUHD113__20231102__222411",
  TILD059_TILD113__20231102 = "output-XETG00048__0005122__MERGED_TILD059_TILD113__20231102__222410",
  TILD109_THD0016__20231102 = "output-XETG00048__0005122__MERGED_TILD109_THD0016__20231102__222410",
  TILD136_VUILD102_TILD102_VUILD78 = "output-XETG00048__0005122__MERGED_TILD136_VUILD102_TILD102_VUILD78__20231102__222411")

# Get subdirectory names for obtaining file paths
subdirs <- unname(id_list)

# Get transcript counts and metadata
all_files <- list.files(file.path(data_dir, subdirs), full.names = TRUE)
h5_files <- all_files[grep(".h5", all_files)]
transcript_files <- all_files[grep("transcripts.csv.gz", all_files)]
meta_files <- all_files[grep("cells.csv.gz", all_files)]

# Get sample IDs
sample_ids <- names(id_list)

# Read in files
counts <- lapply(h5_files, Read10X_h5)
Genome matrix has multiple modalities, returning a list of matrices for this genome
Genome matrix has multiple modalities, returning a list of matrices for this genome
Genome matrix has multiple modalities, returning a list of matrices for this genome
Genome matrix has multiple modalities, returning a list of matrices for this genome
Genome matrix has multiple modalities, returning a list of matrices for this genome
Genome matrix has multiple modalities, returning a list of matrices for this genome
Genome matrix has multiple modalities, returning a list of matrices for this genome
Genome matrix has multiple modalities, returning a list of matrices for this genome
Genome matrix has multiple modalities, returning a list of matrices for this genome
Genome matrix has multiple modalities, returning a list of matrices for this genome
Genome matrix has multiple modalities, returning a list of matrices for this genome
Genome matrix has multiple modalities, returning a list of matrices for this genome
Genome matrix has multiple modalities, returning a list of matrices for this genome
Genome matrix has multiple modalities, returning a list of matrices for this genome
Genome matrix has multiple modalities, returning a list of matrices for this genome
Genome matrix has multiple modalities, returning a list of matrices for this genome
Genome matrix has multiple modalities, returning a list of matrices for this genome
Genome matrix has multiple modalities, returning a list of matrices for this genome
Genome matrix has multiple modalities, returning a list of matrices for this genome
Genome matrix has multiple modalities, returning a list of matrices for this genome
Genome matrix has multiple modalities, returning a list of matrices for this genome
transcripts <- lapply(transcript_files, function(XX) {
  read_csv(XX, col_types = c(transcript_id = "c", cell_id = "c")) })

metadata <- lapply(meta_files, function(XX) {
  tmp_meta <- read.delim(XX, sep = ",", colClasses = c(cell_id = "character"))
  rownames(tmp_meta) <- tmp_meta$cell_id
  tmp_meta })

# Rename files in lists
sample_ids <- unlist(lapply(str_split(meta_files, "__"), function(XX) { XX[5] }))
names(counts) <- sample_ids
names(transcripts) <- sample_ids
names(metadata) <- sample_ids

Get transcripts that only overlap the nucleus, create cell x gene matrix,

and count the number of blanks per cell

all_transcripts <- list()
nuc_transcripts <- list()
updated_metadata <- list()
for (sm in sample_ids) {
  message(paste("Getting nuclei counts for sample", sm))
  
  # Filter out low quality transcripts 
  all_transcripts[[sm]] <- transcripts[[sm]][transcripts[[sm]]$qv > 20, ]
  
  # Find transcripts that overlap a nucleus
  nuc_transcripts[[sm]] <- transcripts[[sm]][transcripts[[sm]]$overlaps_nucleus == "1", ]
  
  # Create cell x gene dataframe
  nuc_transcripts[[sm]] <- as.data.frame(table(nuc_transcripts[[sm]]$cell_id, 
                                               nuc_transcripts[[sm]]$feature_name))
  names(nuc_transcripts[[sm]]) <- c("cell_id", "feature_name", "Count")
  nuc_transcripts[[sm]] <- nuc_transcripts[[sm]] %>% 
    pivot_wider(names_from = "feature_name", values_from = "Count")
  
  # Get blanks count per nucleus
  blank_nuc_ids <- nuc_transcripts[[sm]]$cell_id
  blank_nuc_mat <- nuc_transcripts[[sm]][, grep("BLANK", 
                                                colnames(nuc_transcripts[[sm]]))]
  blank_nuc_counts <- as.data.frame(rowSums(blank_nuc_mat))
  blank_nuc_counts$cell_id <- blank_nuc_ids
  
  # Remove negative controls and convert to cell x gene matrix
  nuc_transcripts[[sm]] <- nuc_transcripts[[sm]][, grep("NegControl", 
                                                        colnames(nuc_transcripts[[sm]]), 
                                                        invert = TRUE)]
  nuc_transcripts[[sm]] <- nuc_transcripts[[sm]][, grep("BLANK", 
                                                        colnames(nuc_transcripts[[sm]]), 
                                                        invert = TRUE)]
  keep_cells <- nuc_transcripts[[sm]]$cell_id
  nuc_transcripts[[sm]] <- as.data.frame(nuc_transcripts[[sm]])
  rownames(nuc_transcripts[[sm]]) <- keep_cells
  nuc_transcripts[[sm]] <- nuc_transcripts[[sm]][, -1]
  nuc_transcripts[[sm]] <- as.matrix(t(nuc_transcripts[[sm]]))
  
  # Subset nuclear metadata to "cells" with transcripts that overlap nuclei
  updated_metadata[[sm]] <- metadata[[sm]][metadata[[sm]]$cell_id %in% keep_cells, ]
  
  # Add blank counts to metadata
  updated_metadata[[sm]] <- full_join(updated_metadata[[sm]], blank_nuc_counts,
                                      by = "cell_id")
  updated_metadata[[sm]] <- updated_metadata[[sm]] %>%
    rename(num.blank = `rowSums(blank_nuc_mat)`)
  rownames(updated_metadata[[sm]]) <- updated_metadata[[sm]]$cell_id
}
Getting nuclei counts for sample MERGED_TILD010_VUHD115_TILD126
Getting nuclei counts for sample MERGED_TILD028_TILD074
Getting nuclei counts for sample MERGED_TILD041_TILD055
Getting nuclei counts for sample THD0026
Getting nuclei counts for sample TILD001
Getting nuclei counts for sample TILD062
Getting nuclei counts for sample TILD093
Getting nuclei counts for sample TILD103
Getting nuclei counts for sample TILD123
Getting nuclei counts for sample VUILD104
Getting nuclei counts for sample VUILD48
Getting nuclei counts for sample VUILD96
Getting nuclei counts for sample MERGED_TILD049_TILD111_TILD080_VUHD113
Getting nuclei counts for sample MERGED_TILD059_TILD113
Getting nuclei counts for sample MERGED_TILD109_THD0016
Getting nuclei counts for sample MERGED_TILD136_VUILD102_TILD102_VUILD78
Getting nuclei counts for sample TILD006
Getting nuclei counts for sample TILD030
Getting nuclei counts for sample TILD037
Getting nuclei counts for sample VUILD59
Getting nuclei counts for sample VUILD91

Create Seurat objects

obj_list <- list()
obj_list <- sapply(sample_ids, function(XX) {
  # Create a Seurat object containing the RNA adata
  sobj <- CreateSeuratObject(counts = nuc_transcripts[[XX]], 
                             assay = "RNA")
  
  # Add metadata
  sobj <- AddMetaData(sobj, metadata = updated_metadata[[XX]])
  sobj$sample <- XX
  #sobj$tma <- tmas[[XX]]
  #sobj$run <- run_ids[[XX]]
  
  # Calculate percent blank
  sobj$percent.blank <- sobj$num.blank/(sobj$nCount_RNA + sobj$num.blank)*100
  
  # Remove cells with 0 nCount_RNA
  sobj <- subset(sobj, subset = nCount_RNA != 0)
  
  # Rename cells to add sample ID as prefix
  if (XX %in% c("MERGED_TILD010_VUHD115_TILD126",
                "MERGED_TILD028_TILD074",
                "MERGED_TILD041_TILD055",
                "MERGED_TILD049_TILD111_TILD080_VUHD113",
                "MERGED_TILD059_TILD113__20231102",
                "MERGED_TILD109_THD0016__20231102",
                "MERGED_TILD136_VUILD102_TILD102_VUILD78"))
    {
    
    position_xy <- cbind(sobj$x_centroid, sobj$y_centroid)
    row.names(position_xy) <- row.names(sobj@meta.data)
    colnames(position_xy) <- c("SP_1", "SP_2")
    sobj[["sp"]] <- CreateDimReducObject(embeddings = position_xy, key = "SP_",
                                         assay = DefaultAssay(sobj))
    obj_list[[XX]] <- sobj
    
  } else {
    sobj <- RenameCells(sobj, add.cell.id = XX)
    
    # Add spatial coordinates as dimension reduction objects
    #position_xy <- cbind(sobj$adj_x_centroid, sobj$adj_y_centroid)
    position_xy <- cbind(sobj$x_centroid, sobj$y_centroid)
    row.names(position_xy) <- row.names(sobj@meta.data)
    colnames(position_xy) <- c("SP_1", "SP_2")
    sobj[["sp"]] <- CreateDimReducObject(embeddings = position_xy, key = "SP_",
                                         assay = DefaultAssay(sobj))
    obj_list[[XX]] <- sobj
  }

})
Counts matrix provided is not sparse. Creating V5 assay in Seurat Object.
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Counts matrix provided is not sparse. Creating V5 assay in Seurat Object.
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Counts matrix provided is not sparse. Creating V5 assay in Seurat Object.
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Counts matrix provided is not sparse. Creating V5 assay in Seurat Object.
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Counts matrix provided is not sparse. Creating V5 assay in Seurat Object.
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Counts matrix provided is not sparse. Creating V5 assay in Seurat Object.
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Counts matrix provided is not sparse. Creating V5 assay in Seurat Object.
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Counts matrix provided is not sparse. Creating V5 assay in Seurat Object.
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Counts matrix provided is not sparse. Creating V5 assay in Seurat Object.
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Counts matrix provided is not sparse. Creating V5 assay in Seurat Object.
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Counts matrix provided is not sparse. Creating V5 assay in Seurat Object.
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Counts matrix provided is not sparse. Creating V5 assay in Seurat Object.
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Counts matrix provided is not sparse. Creating V5 assay in Seurat Object.
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Counts matrix provided is not sparse. Creating V5 assay in Seurat Object.
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Counts matrix provided is not sparse. Creating V5 assay in Seurat Object.
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Counts matrix provided is not sparse. Creating V5 assay in Seurat Object.
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Counts matrix provided is not sparse. Creating V5 assay in Seurat Object.
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Counts matrix provided is not sparse. Creating V5 assay in Seurat Object.
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Counts matrix provided is not sparse. Creating V5 assay in Seurat Object.
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Counts matrix provided is not sparse. Creating V5 assay in Seurat Object.
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Counts matrix provided is not sparse. Creating V5 assay in Seurat Object.
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
#saveRDS(obj_list, "/scratch/hnatri/ILD/ILD_spatial_ASE/obj_list.rds")

# Get sample IDs
sample_ids <- names(obj_list)

Visualize

# Merge objects (cannot do spatial DimPlots for this)
merged_spatial_unfiltered <- merge(x = obj_list[[1]], y = obj_list[2:length(obj_list)])
Warning: Some cell names are duplicated across objects provided. Renaming to
enforce unique cell names.
# Add spatial dimension reduction object separately
position_xy <- cbind(merged_spatial_unfiltered$x_centroid,
                     merged_spatial_unfiltered$y_centroid)
row.names(position_xy) <- row.names(merged_spatial_unfiltered@meta.data)
colnames(position_xy) <- c("SP_1", "SP_2")
merged_spatial_unfiltered[["sp"]] <- CreateDimReducObject(
  embeddings = position_xy, key = "SP_", assay = DefaultAssay(merged_spatial_unfiltered))

DimPlot(merged_spatial_unfiltered, reduction = "sp", group.by = "sample", label = TRUE)
Rasterizing points since number of points exceeds 100,000.
To disable this behavior set `raster=FALSE`

#saveRDS(merged_spatial_unfiltered, "/scratch/hnatri/ILD/ILD_spatial_ASE/merged_spatial_unfiltered.rds")

Add cell level count data

# Get sample IDs
sample_ids <- names(obj_list)

cell_obj_list <- list()
cell_obj_list <- sapply(sample_ids, function(XX) {
  message(paste("Creating cell Seurat object for sample", XX))
  
  # Create a Seurat object containing the RNA cell information
  sobj <- CreateSeuratObject(counts = counts[[XX]]$`Gene Expression`,
                             assay = "RNA")
  rownames(metadata[[XX]]) <- metadata[[XX]]$cell_id
  sobj <- AddMetaData(sobj, metadata = metadata[[XX]])
  
  # Rename cells to add sample ID as prefix
  if (XX %in% c("MERGED_TILD010_VUHD115_TILD126",
                "MERGED_TILD028_TILD074",
                "MERGED_TILD041_TILD055",
                "MERGED_TILD049_TILD111_TILD080_VUHD113",
                "MERGED_TILD059_TILD113__20231102",
                "MERGED_TILD109_THD0016__20231102",
                "MERGED_TILD136_VUILD102_TILD102_VUILD78"))
  {
    
    cell_obj_list[[XX]] <- sobj
    
  } else {
    sobj <- RenameCells(sobj, add.cell.id = XX)
    cell_obj_list[[XX]] <- sobj
  }
})
Creating cell Seurat object for sample MERGED_TILD010_VUHD115_TILD126
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')

Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Creating cell Seurat object for sample MERGED_TILD028_TILD074
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')

Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Creating cell Seurat object for sample MERGED_TILD041_TILD055
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')

Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Creating cell Seurat object for sample THD0026
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')

Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Creating cell Seurat object for sample TILD001
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')

Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Creating cell Seurat object for sample TILD062
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')

Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Creating cell Seurat object for sample TILD093
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')

Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Creating cell Seurat object for sample TILD103
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')

Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Creating cell Seurat object for sample TILD123
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')

Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Creating cell Seurat object for sample VUILD104
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')

Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Creating cell Seurat object for sample VUILD48
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')

Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Creating cell Seurat object for sample VUILD96
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')

Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Creating cell Seurat object for sample MERGED_TILD049_TILD111_TILD080_VUHD113
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')

Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Creating cell Seurat object for sample MERGED_TILD059_TILD113
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')

Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Creating cell Seurat object for sample MERGED_TILD109_THD0016
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')

Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Creating cell Seurat object for sample MERGED_TILD136_VUILD102_TILD102_VUILD78
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')

Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Creating cell Seurat object for sample TILD006
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')

Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Creating cell Seurat object for sample TILD030
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')

Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Creating cell Seurat object for sample TILD037
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')

Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Creating cell Seurat object for sample VUILD59
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')

Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
Creating cell Seurat object for sample VUILD91
Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')

Warning: Feature names cannot have underscores ('_'), replacing with dashes
('-')
# Merge cell information
cell_merged <- merge(cell_obj_list[[1]], y = cell_obj_list[2:length(cell_obj_list)])
Warning: Some cell names are duplicated across objects provided. Renaming to
enforce unique cell names.
# Add cell information to nuclei object
cell_count_matrix <- cell_merged@assays$RNA@counts
keep_cells <- colnames(merged_spatial_unfiltered)
cell_count_matrix <- cell_count_matrix[, keep_cells]
merged_spatial_unfiltered[["cell_RNA"]] <- CreateAssayObject(counts = cell_count_matrix)

#saveRDS(merged_spatial_unfiltered, "/scratch/hnatri/ILD/ILD_spatial_ASE/merged_spatial_unfiltered.rds")

QC

# Number of cells per sample before filtering
summary(as.factor(merged_spatial_unfiltered$sample))
         MERGED_TILD010_VUHD115_TILD126                  MERGED_TILD028_TILD074 
                                 185987                                   46338 
                 MERGED_TILD041_TILD055  MERGED_TILD049_TILD111_TILD080_VUHD113 
                                 189053                                  122598 
                 MERGED_TILD059_TILD113                  MERGED_TILD109_THD0016 
                                  29420                                   48243 
MERGED_TILD136_VUILD102_TILD102_VUILD78                                 THD0026 
                                 108104                                   42638 
                                TILD001                                 TILD006 
                                  57155                                   60377 
                                TILD030                                 TILD037 
                                  45762                                   38738 
                                TILD062                                 TILD093 
                                  54331                                   18431 
                                TILD103                                 TILD123 
                                  74704                                    5048 
                               VUILD104                                 VUILD48 
                                  50376                                   17374 
                                VUILD59                                 VUILD91 
                                   5670                                   11584 
                                VUILD96 
                                  59623 
merged_spatial_unfiltered@meta.data %>%
  ggplot(aes(y = sample)) +
  geom_bar()

# Percent.blank
merged_spatial_unfiltered@meta.data %>%
  ggplot(aes(x = percent.blank, fill = sample)) +
  geom_histogram(bins = 50, show.legend = FALSE, color = "black") +
  theme_classic() +
  theme(title = element_text(color = "black"), 
        axis.text = element_text(color = "black")) +
  facet_wrap(~sample, scales = "free")

# nCount_RNA
merged_spatial_unfiltered@meta.data %>%
  ggplot(aes(x = nCount_RNA, fill = sample)) +
  geom_histogram(bins = 50, show.legend = FALSE, color = "black") +
  theme_classic() +
  theme(title = element_text(color = "black"), 
        axis.text = element_text(color = "black")) +
  facet_wrap(~sample, scales = "free")

# nucleus_area
merged_spatial_unfiltered@meta.data %>%
  ggplot(aes(x = nucleus_area, fill = sample)) +
  geom_histogram(bins = 50, show.legend = FALSE, color = "black") +
  theme_classic() +
  theme(title = element_text(color = "black"), 
        axis.text = element_text(color = "black")) +
  facet_wrap(~sample, scales = "free")

merged_spatial_unfiltered$sample <- factor(merged_spatial_unfiltered$sample,
                                           levels = rev(sort(unique(merged_spatial_unfiltered$sample))))
BetterVlnPlot(merged_spatial_unfiltered, features = "percent.blank")

BetterVlnPlot(merged_spatial_unfiltered, features = "nCount_RNA")

BetterVlnPlot(merged_spatial_unfiltered, features = "nFeature_RNA")

BetterVlnPlot(merged_spatial_unfiltered, features = "nucleus_area")

# nCount_RNA vs. percent.blank
smoothScatter(merged_spatial_unfiltered@meta.data$percent.blank,
              log(merged_spatial_unfiltered@meta.data$nCount_RNA),
              cex = 0.5, pch = 16)
abline(v = 4, h = log(12), lty = "dashed", col = "black")
text(5, 5, col = "black", adj = c(0, -.1),
     "nCount_RNA >= 12 & percent.blank <= 4")

# nFeature_RNA vs. percent.blank
smoothScatter(merged_spatial_unfiltered@meta.data$percent.blank,
              log(merged_spatial_unfiltered@meta.data$nFeature_RNA),
              cex = 0.5, pch = 16)
abline(v = 4, h = log(10), lty = "dashed", col = "black")
text(5, 4, col = "black", adj = c(0, -.1),
     "nFeature_RNA >= 10 & percent.blank <= 4")

# nCount_RNA vs. nFeature_RNA
smoothScatter(log(merged_spatial_unfiltered$nCount_RNA),
              log(merged_spatial_unfiltered$nFeature_RNA),
              cex = 0.5, pch = 16)
abline(v = log(10), h = log(10), lty = "dashed", col = "black")
text(0.3, 4.6, col = "black", adj = c(0, -.1),
     "nCount_RNA >= 10 & nFeature_RNA >= 10")

# nCount RNA vs. nucleus_area
smoothScatter(merged_spatial_unfiltered$nucleus_area,
              log(merged_spatial_unfiltered$nCount_RNA),
              cex = 0.5, pch = 16)
abline(v = c(6, 80), h = log(10), lty = "dashed", col = "black")
text(120, 0.7, col = "black", adj = c(0, -.1),
     "nCount_RNA >= 10 & nucleus_area between 6-80")

# nFeature RNA vs. nucleus_area
smoothScatter(merged_spatial_unfiltered$nucleus_area,
              log(merged_spatial_unfiltered$nFeature_RNA),
              cex = 0.5, pch = 16)
abline(v = c(6, 80), h = log(10), lty = "dashed", col = "black")
text(120, 0.4, col = "black", adj = c(0, -.1),
     "nFeature_RNA >= 10 & & nucleus_area between 6-80")

min(merged_spatial_unfiltered$nucleus_area)
[1] 2.799688
max(merged_spatial_unfiltered$nucleus_area)
[1] 388.8856

Filter

# Filter merged and individual data
merged_spatial <- subset(merged_spatial_unfiltered,
                          subset = nCount_RNA >= 10 & nFeature_RNA >= 10 &
                            percent.blank <= 5 & 
                            nucleus_area >= 6 & nucleus_area <= 80)

# Number of nuclei before and after filtering
bf_cells <- table(merged_spatial_unfiltered$sample)
aft_cells <- table(merged_spatial$sample)
diff_cells <- bf_cells - aft_cells
prop_kept_cells <- round(aft_cells/bf_cells*100, 2)
prop_kept_cells

                                VUILD96                                 VUILD91 
                                  95.41                                   90.81 
                                VUILD59                                 VUILD48 
                                  97.11                                   89.84 
                               VUILD104                                 TILD123 
                                  94.26                                   69.24 
                                TILD103                                 TILD093 
                                  92.42                                   96.07 
                                TILD062                                 TILD037 
                                  85.68                                   90.24 
                                TILD030                                 TILD006 
                                  95.37                                   95.15 
                                TILD001                                 THD0026 
                                  86.88                                   94.38 
MERGED_TILD136_VUILD102_TILD102_VUILD78                  MERGED_TILD109_THD0016 
                                  94.01                                   95.27 
                 MERGED_TILD059_TILD113  MERGED_TILD049_TILD111_TILD080_VUHD113 
                                  91.89                                   96.01 
                 MERGED_TILD041_TILD055                  MERGED_TILD028_TILD074 
                                  92.99                                   81.23 
         MERGED_TILD010_VUHD115_TILD126 
                                  95.21 
# DimPlots of before and after for each sample
#DimPlotCompare <- function(sm){
#  bf_cells <- ncol(subset(merged_spatial_unfiltered, subset = sample == sm))
#  a <- DimPlot(subset(merged_spatial_unfiltered, subset = sample == sm),
#               reduction = "sp") + NoLegend() +
#    labs(title = paste0(sm, ", Unfiltered, ", bf_cells, " nuclei"))
#  
#  aft_cells <- ncol(subset(merged_spatial, subset = sample == sm))
#  b <- DimPlot(subset(merged_spatial, subset = sample == sm),
#               reduction = "sp") + NoLegend() +
#    labs(title = paste0(sm, ", Filtered, ", aft_cells, " nuclei"))
#  ggarrange(a,b)
#}

#saveRDS(merged_spatial, "/scratch/hnatri/ILD/ILD_spatial_ASE/merged_spatial_filtered.rds")

# To build on command line, run Rscript -e "rmarkdown::render('inputfile.Rmd')"

sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] workflowr_1.7.1         ggrepel_0.9.3           ggpubr_0.6.0           
 [4] lubridate_1.9.2         forcats_1.0.0           stringr_1.5.0          
 [7] dplyr_1.1.2             purrr_1.0.1             readr_2.1.4            
[10] tidyr_1.3.0             tibble_3.2.1            ggplot2_3.4.2          
[13] tidyverse_2.0.0         SeuratDisk_0.0.0.9021   Seurat_4.9.9.9048      
[16] SeuratObject_4.9.9.9084 sp_1.6-1                cli_3.6.1              

loaded via a namespace (and not attached):
  [1] RColorBrewer_1.1-3     rstudioapi_0.14        jsonlite_1.8.5        
  [4] magrittr_2.0.3         spatstat.utils_3.0-3   farver_2.1.1          
  [7] rmarkdown_2.22         fs_1.6.2               vctrs_0.6.2           
 [10] ROCR_1.0-11            spatstat.explore_3.2-1 rstatix_0.7.2         
 [13] htmltools_0.5.5        broom_1.0.4            sass_0.4.6            
 [16] sctransform_0.3.5      parallelly_1.36.0      KernSmooth_2.23-21    
 [19] bslib_0.4.2            htmlwidgets_1.6.2      ica_1.0-3             
 [22] plyr_1.8.8             plotly_4.10.2          zoo_1.8-12            
 [25] cachem_1.0.8           whisker_0.4.1          igraph_1.4.3          
 [28] mime_0.12              lifecycle_1.0.3        pkgconfig_2.0.3       
 [31] Matrix_1.5-4.1         R6_2.5.1               fastmap_1.1.1         
 [34] fitdistrplus_1.1-11    future_1.32.0          shiny_1.7.4           
 [37] digest_0.6.31          colorspace_2.1-0       ps_1.7.5              
 [40] patchwork_1.1.2        rprojroot_2.0.3        tensor_1.5            
 [43] RSpectra_0.16-1        irlba_2.3.5.1          labeling_0.4.2        
 [46] progressr_0.13.0       timechange_0.2.0       fansi_1.0.4           
 [49] spatstat.sparse_3.0-1  httr_1.4.6             polyclip_1.10-4       
 [52] abind_1.4-5            compiler_4.3.0         bit64_4.0.5           
 [55] withr_2.5.0            backports_1.4.1        carData_3.0-5         
 [58] fastDummies_1.6.3      highr_0.10             ggsignif_0.6.4        
 [61] MASS_7.3-60            tools_4.3.0            lmtest_0.9-40         
 [64] httpuv_1.6.11          future.apply_1.11.0    goftest_1.2-3         
 [67] glue_1.6.2             callr_3.7.3            nlme_3.1-162          
 [70] promises_1.2.0.1       grid_4.3.0             getPass_0.2-2         
 [73] Rtsne_0.16             cluster_2.1.4          reshape2_1.4.4        
 [76] generics_0.1.3         hdf5r_1.3.8            gtable_0.3.3          
 [79] spatstat.data_3.0-1    tzdb_0.4.0             hms_1.1.3             
 [82] data.table_1.14.8      car_3.1-2              utf8_1.2.3            
 [85] spatstat.geom_3.2-1    RcppAnnoy_0.0.20       RANN_2.6.1            
 [88] pillar_1.9.0           vroom_1.6.3            spam_2.9-1            
 [91] RcppHNSW_0.4.1         later_1.3.1            splines_4.3.0         
 [94] lattice_0.21-8         survival_3.5-5         bit_4.0.5             
 [97] deldir_1.0-9           tidyselect_1.2.0       miniUI_0.1.1.1        
[100] pbapply_1.7-0          knitr_1.43             git2r_0.32.0          
[103] gridExtra_2.3          scattermore_1.1        xfun_0.39             
[106] matrixStats_1.0.0      stringi_1.7.12         lazyeval_0.2.2        
[109] yaml_2.3.7             evaluate_0.21          codetools_0.2-19      
[112] uwot_0.1.14            xtable_1.8-4           reticulate_1.29       
[115] processx_3.8.1         munsell_0.5.0          jquerylib_0.1.4       
[118] Rcpp_1.0.10            globals_0.16.2         spatstat.random_3.1-5 
[121] png_0.1-8              parallel_4.3.0         ellipsis_0.3.2        
[124] dotCall64_1.0-2        listenv_0.9.0          viridisLite_0.4.2     
[127] scales_1.2.1           ggridges_0.5.4         leiden_0.4.3          
[130] crayon_1.5.2           rlang_1.1.1            cowplot_1.1.1